What makes a reaction network “chemical”?

Background Reaction networks (RNs) comprise a set X of species and a set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathscr {R}$$\end{document}R of reactions \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Y\rightarrow Y'$$\end{document}Y→Y′, each converting a multiset of educts \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Y\subseteq X$$\end{document}Y⊆X into a multiset \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Y'\subseteq X$$\end{document}Y′⊆X of products. RNs are equivalent to directed hypergraphs. However, not all RNs necessarily admit a chemical interpretation. Instead, they might contradict fundamental principles of physics such as the conservation of energy and mass or the reversibility of chemical reactions. The consequences of these necessary conditions for the stoichiometric matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf {S}\in \mathbb {R}^{X\times \mathscr {R}}$$\end{document}S∈RX×R have been discussed extensively in the chemical literature. Here, we provide sufficient conditions for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf {S}$$\end{document}S that guarantee the interpretation of RNs in terms of balanced sum formulas and structural formulas, respectively. Results Chemically plausible RNs allow neither a perpetuum mobile, i.e., a “futile cycle” of reactions with non-vanishing energy production, nor the creation or annihilation of mass. Such RNs are said to be thermodynamically sound and conservative. For finite RNs, both conditions can be expressed equivalently as properties of the stoichiometric matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf {S}$$\end{document}S. The first condition is vacuous for reversible networks, but it excludes irreversible futile cycles and—in a stricter sense—futile cycles that even contain an irreversible reaction. The second condition is equivalent to the existence of a strictly positive reaction invariant. It is also sufficient for the existence of a realization in terms of sum formulas, obeying conservation of “atoms”. In particular, these realizations can be chosen such that any two species have distinct sum formulas, unless \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf {S}$$\end{document}S implies that they are “obligatory isomers”. In terms of structural formulas, every compound is a labeled multigraph, in essence a Lewis formula, and reactions comprise only a rearrangement of bonds such that the total bond order is preserved. In particular, for every conservative RN, there exists a Lewis realization, in which any two compounds are realized by pairwisely distinct multigraphs. Finally, we show that, in general, there are infinitely many realizations for a given conservative RN. Conclusions “Chemical” RNs are directed hypergraphs with a stoichiometric matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf {S}$$\end{document}S whose left kernel contains a strictly positive vector and whose right kernel does not contain a futile cycle involving an irreversible reaction. This simple characterization also provides a concise specification of random models for chemical RNs that additionally constrain \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf {S}$$\end{document}S by rank, sparsity, or distribution of the non-zero entries. Furthermore, it suggests several interesting avenues for future research, in particular, concerning alternative representations of reaction networks and infinite chemical universes. Supplementary Information The online version contains supplementary material available at 10.1186/s13321-022-00621-8.

by assigning properties such as mass, energy, sum formulas, or structural formulas to the compounds. Similarly, reactions may be associated with rate constants, equilibrium constants, and so on. A formal theory of reaction networks (RN) describes a reaction on a given set of compounds X as a stoichiometric relation, i.e., as a pair of formal sums of chemical species x ∈ X: The left-hand side in Eq. (1) lists the educts and the right-hand side gives the products of the reaction. The stoichiometric coefficients s − xr ∈ N 0 and s + xr ∈ N 0 denote the number of species x ∈ X that are consumed (on the left-hand side) or produced (on the right-hand side) by the reaction r, respectively. A species x ∈ X is an educt in reaction r if s − xr > 0 and a product if s + xr > 0 . If s + xr = s − xr = 0 , then species x does not take part in reaction r and is suppressed in the conventional chemical notation. The formal sums x∈X s − xr x and x∈X s + xr x form the complexes of educts r − and products r + of the reaction r. We denote the set of reactions under considerations by R and call the pair (X, R ) a reaction network (RN). Throughout this contribution we will assume that both X and R are non-empty and finite. Excluding explicit catalysis, that is, forbidding s − xr s + xr > 0 , it suffices to consider the stoichiometric matrix S ∈ N X×R 0 . Its entries S xr = s + xr − s − xr describe the net production or consumption of species x in reaction r. In many practical applications, e.g. in the context of metabolic networks, RNs are embedded in an open system. In that manner, the consumption of nutrients and the production of waste can be modeled. We will return to this point only after discussing chemical RNs in isolation, i.e., as closed systems. Several graph representations have been considered as (simplified) models of a RN, see [1] for a recent summary. In contrast to the pair (X, R) , they do not always completely represent the RN.
The S-graph (species graph, compound graph, or substrate network in the context of metabolic networks) has the species as its vertices. A (directed) edge connects x to y if the RN contains a reaction that has x as an educt and y as a product [2,3]. The corresponding construction in the kinetic setting is the interaction graph with undirected edges whenever ∂[x]/∂[y] � ≡ 0 , which are usually annotated by the sign of the derivative [4]. S-graphs have also proved to be useful in approximation algorithms for the minimal seed set problem [5], which asks for the smallest set of substrates that can generate all metabolites. Complementarily, reaction graphs model reactions as nodes, while edges denote shared molecules [6].
The complex-reaction graph simply has the complexes C (the left-and right-hand sides of the reactions) as its vertex set and the reactions R as its edge set. That is, two complexes r − and r + are connected by a directed edge if there is a reaction r = (r − , r + ) ∈ R . Its incidence matrix Z ∈ R C×R (with entries Z cr = −1 if c = r − , Z cr = 1 if c = r + , and Z cr = 0 otherwise) is linked to the stochiometric matrix via S = YZ , where the entries of the (stoichiometric) complex matrix Y ∈ R X×C are the corresponding stochiometric coefficients. The complexreaction graph plays a key role in the analysis of chemical reaction networks with mass-action kinetics and arbitrary rate constants, as studied in classical "chemical reaction network theory" (CRNT) [7][8][9]. It gives rise to notions such as "complex balancing" and "deficiency", which allow the formulation of strong (global) stability results, see e.g. [10,11].
SR-graphs (Species-reaction networks) are bipartite graphs with different types of nodes for chemical species and reactions, respectively [12,13]. As such, they can be endowed with additional annotations or extended with multiple edges to represent stoichiometric coefficients. In this extended form, they are faithful representations of chemical RNs. Alternatively, the edges are often annotated with the multiplicities of molecules, i.e., the stoichiometric coefficients; in this case, they completely specify the RN (X, R) . Undirected SR-graphs have a close relationship to classical deficiency theory [7,9] and form the starting point for a qualitative theory of chemical RN kinetics [14]). More detailed information on qualitative kinetic behavior can be extracted from directed SRgraphs [15]. Both the S-and the R-graph can be extracted unambigously from an SR-graph.
The bipartite SR-graphs can be interpreted as the König's representation [16] of directed hypergraphs. The connection between hypergraph and graph representations is discussed in some more detail in [17]. While SR-graphs and directed hypergraphs can be transformed into each other, they carry a very different semantic. For instance, the notions of path and connectivity are very different for bipartite graphs and directed hypergraphs [18]. It has been argued, therefore that any graph representation of chemical networks necessarily treats edges as independent entities and thus fails to correctly capture the nature of chemical reactions [19,20]. In a similar vein, [21] adopts the hypergraph representation and models (bio)chemical pathways as integer hyperflows to ensure mass balance at each vertex. Not every pair of an S-and R-graph implies an SR-graph, and if they do, the result need not be unique [6].
Over the last decade, many authors, including one of us, have investigated metabolic networks from a statistical perspective and reached the conclusion that they are distictly "non-random", presumably as the consequence of four billion years of evolution. This conclusion is typically reached by first converting a RN into one of the graph representations mentioned above. The choice of graphs is largely motivated by a desire to place metabolic or other chemical RNs within the scheme of small world and scale free networks and to analyze the RNs with the wellestablished tools of network science [19,22]. Thus one concludes that graph-theoretical properties of metabolic networks are significantly different from the properties of randomly generated or randomized background models for chemical reaction networks [3,[23][24][25]. The insights gained from this "non-randomness" of metabolism, however, critically depend on what exactly the authors meant by "random", that is, how the background models are defined. In particular, it is important to understand whether differences between chemical networks and the background are caused by the implementation of universal properties (that any "chemistry-like" RN must satisfy) or whether they arise from the intrinsic structure of particular chemical networks.
To this end, however, we first need a comprehensive conception of what constitutes a chemistry-like reaction network. The different representations used in the literature highlight the fact that it is far from obvious which graphs or hypergraphs properly describe chemical RNs among a possibly much larger set of network models. There is a significant body of work in the literature that describes necessary conditions on the stoichiometric matrix S that derive from key properties of chemical RNs, such as the conservation of mass or atoms in each reaction [8,[26][27][28][29][30]. In contrast, we are interested here in sufficient conditions with the aim of providing a concise characterization of RNs (X, R) and their stoichiometric matrices S that describe reaction system that can reasonably be considered as "chemistry-like". This is of practical relevance in particular for the construction of artifical chemistry models [31][32][33][34] and random "chemistries": It is still an open problem how random RNs can be constructed that can serve as fair, chemistry-like background models. We therefore start with a brief survey of random artificial chemistries and randomized RNs. As we shall see in the following section, oftentimes no explicit provisions are made to include "chemical" constraints such as the conservation of matter and energy into the background models.
Beyond the practical importance for the generation of random chemistries, it is also of interest to ask whether and to what extent the stoichiometry of a RN constrains the underlying chemistry, i.e., the composition of compounds and the type of reactions. Chemical reaction networks have been studied as a paradigm of computation that is quite different from, but theoretically equally powerful as Turing machines [35][36][37][38]. In the case of DNA based computing [39], the field has matured to the point that a compiler for translating chemical reaction networks into nucleic acid strand displacement systems has become available [40]. If chemical reaction networks are to be used as computing devices, a necessary intermediate step is to design reaction systems that implement a given stoichiometric matrix. Constraints on the chemistry imposed by the desired network stoichiometry itself thus become an issue in the design process, prompting us to ask whether there are chemical limitations to the realizability of RNs also beyond the constraints imposed by thermodynamics.
The main part of this contribution is the characterization of chemistry-like RNs. Starting from the principles of energy conservation and conservation of matter, we derive equivalent conditions on the stoichiometric matrix S . We then introduce realizability of RNs by sum formulas and structural formulas as a first step towards a formalization of chemistry-like networks, and show that conservation of matter is already sufficient to guarantee the existence of such chemistry-like representations. Finally we discuss the consequences of the mathematical results for the construction of random RNs and address some open research questions.

A brief survey of random and randomized chemical RNs
Chemical reaction networks are specified either as a set of chemical reactions or as a system of differential equations describing its kinetics. Graphical models have been extracted from both.

Simple graph models of RNs
S-graphs have been used to explore statistical properties of large RNs. In this line of research, empirical S-graphs are compared to the "usual" random networks models such as Erdős Renyí (ER) random graphs, Small World networks in the sense of Watts and Strogatz [41], or the Álbert-Barabasi model of preferential attachment. Generative models for random graphs with given degree distributions were introduced in [42]. Not surprisingly, chemical reaction networks do not very well conform to either one of them. As noted early on, however, R-graphs of metabolic networks at least qualitatively fit the small world paradigm [22]. More sophisticated analyses detected evidence for modularity and hierarchical organization in metabolic networks [43], using random graph models with the same degree distributions as contrasts. Arita noted, however, that S-graphs are poor representations of biochemical pathways and proposed an analysis in terms of atom traces, concluding that "the metabolic world [of E. coli] is not small in terms of biosynthesis and degradation" [44]. The motivation to focus on atom maps comes from the insight that two compounds that are linked by reactions are only related by the chemical transformation if they share at least one atom.
A versatile generator for bipartite graphs that can handle joint degree distributions is described in [45]. Surprisingly, bipartite random graph models apparently have not been used to model chemistry. Instead of generative models such as the ER graph or the preferential attachment model, null models are often specified in terms of rewiring, that is, edit operations on the graph. Rewiring rules define a Markov Process on a set of graphs that can produce samples of randomized networks. The key idea is to specify the rewiring procedure in such a way that it preserves graph properties that are perceived to be important [46,47]. For example, the degrees of all vertices in a digraph are preserved when a pair of directed edges x 1 y 1 and x 2 y 2 is replaced by x 1 y 2 and x 2 y 1 as long as x 1 and x 2 have the same out-degree while y 1 and y 2 have the same in-degree. Randomization procedures for bipartite graphs have become available in the context of ecological networks [48] or trade networks [49]. To our knowledge they have not been used for SR graphs.

Random (directed) hypergraphs
In [50] a hypergraph is defined as a multiset of hyperedges, each of which in turn is a multiset of vertices. In this setting, a random hypergraph is specified by the probabilities p k to include a hyperedge e with cardinality |e| = k . Similar models for undirected hypergraphs are used e.g. in [51]. In a directed hypergraph, every hyperedge is defined as the pair (e − , e + ) consisting of the multisets e − and e + . The construction of [50] thus naturally generalizes to directed hypergraphs specified by picking e with probability p |e − |,|e + | . In the context of chemistry this amounts to picking educt and product sets for reactions with probabilities depending on their cardinality. This type of random (directed) hypergraph models are the obvious generalizations of the Erdős Renyí (di)graphs. A certain class of random directed hypergraphs with |e − | = 2 and |e + | = 1 for all hyperedges e is considered in [52].
Hypergraphs are also amenable to rewiring procedures that ensures the preservation of certain local or global properties. For instance [17] proposes a scheme that preserves the number and cardinality of the hyperedges (replacing a randomly selected (e − , e + ) with a randomly selected pair of disjoint subsets (e ′ − , e ′ + ) with |e − | = |e ′ − | and |e + | = |e ′ + | ). On this basis, the authors conclude that the hierarchical structure hypothesis proposed in [43] is not supported for metabolic networks when a clustering coefficient is defined for directed hypergraphs. [17] also compares S-and R-graphs of metabolic networks with ensembles of S-and R-graphs derived from randomized directed hypergraphs and cast further doubt on previously reported scaling results. Randomization procedures for hypergraphs that preserve local clustering are described in [53]. An approach that uses a chemical graph rewriting model to ensure soundness of reactions is described in the MSc thesis [54].
In [25] networks are constructed in a stepwise procedure starting with directed graphs whose arcs are then re-interpreted as directed hyperarcs by combining multiple arcs. This process is guided by matching the degree distribution of the implied S-graph.

Reaction universes: random subhypergraphs
Instead of generating a random RN directly from a statistical model or rewiring a given one, one can also start from a reaction universe RU, that is, a RN that contains all species of interest and all known or inferred reactions between them. Without losing generality we can think of the RU as a directed hypergraph in the sense of [50], where the multi-set formalism accounts for the stoichiometric coefficients. In contrast to the generative and rewiring approaches the a priori specification of an RU ensures a high level of chemical realism and RNs can now be sampled by randomly selecting subsets of directed hyperedges, that is, chemical reactions. If the RU already ensures conservation of matter or energy, these properties are inherited by the sub-networks. In order to generate random metabolic networks, reactions can be drawn from databases such as KEGG or EcoCyc [55,56]. Such selections of reactions are sometimes called "metabolic genotypes" since the available reactions are associated with enzymes, whose presence or absence is determined by an organism's genome [55]. In some studies, additional constraints such as the production of biomass are exploited and networks are sampled e.g. by combining Flux-Balance Analysis (FBA) and a Markov Chain Monte Carlo (MCMC) approach [55,57].

A characterization of chemistry-like reaction networks
In this section, we start from reaction networks that are specified as abstract stoichiometric relations, Eq. (1), and identify minimal constraints necessary to avoid blatantly unphysical behavior.

Notation and peliminaries
Let X be a finite set and let R be a pair of formal sums of elements of X with non-negative integer coefficients according to Eq. (1). Then we call the pair (X, R) a reaction network (RN). Equivalently, a RN is a directed, integer-weighted hypergraph with directed edges (r − , r + ) such that x ∈ r − with weight s − xr > 0 and x ∈ r + with weight s + xr > 0 . The weights s − xr and s + xr are usually called the stoichiometric coefficients. We set s − xr = 0 and s + xr = 0 if x / ∈ r − and x / ∈ r − , respectively. We deliberately dropped the qualifier chemical here since, as we shall see, not every RN (X, R) makes sense as a model of a chemical system. In fact, the aim of this contribution is to characterize the set of RNs that make sense as models of chemistry.
Such directed hypergraphs are most conveniently drawn as (bipartite) König multigraphs, with distinct types of vertices representing compounds x ∈ X and reactions r ∈ R , respectively. Stoichiometric coefficients larger than one appear as multiple edges. See the example in Fig. 1.
For each reaction r ∈ R , we define its support as supp (r) = {x | s − xr + s + xr > 0} ; that is, x ∈ supp (r) if it appears as an educt, a product, or a catalyst in r. The stoichiometric matrix of (X, R) is S ∈ N X×R 0 with entries S xr = s + xr − s − xr . We distinguish proper reactions r, for which there is both x ∈ X with S xr < 0 and y ∈ X with S yr > 0 , import reactions for which S xr ≥ 0 for all x ∈ X , and export reactions for which S xr ≤ 0 for all x ∈ X . We write ∅ for the empty formula, hence ∅ −→ A and B −→ ∅ designate the import of A and the export of B, respectively. Note that this definition also allows catalyzed import and export reactions, e.g., C −→ C + A or B + C −→ C.
In thermodynamics, a system is closed if it does not exchange matter with its environment, but may exchange energy in the form of work or heat [60]. For a RN, this rules out import and export reactions.
Given an arbitrary RN (X, R) , there is a unique inclusionmaximal closed RN contained in (X, R) , namely (X, R p ) with We will refer to (X, R p ) as the proper part of (X, R).
(2) R p = {r ∈ R | r is proper}. Fig. 1 Representation of a RN as König multigraph of the corresponding directed hypergraph. Round vertices (with chemical structures shown inside) designate compounds x ∈ X , while reactions r ∈ R are shown as square vertices. Stoichometric coefficients are indicated by the number of edges from x to r for s − xr > 0 and r to x for s + xr > 0 , respectively. A flow (an overall reaction) is given by non-negative integer multiples of individual reactions. Here the coefficients v r are indicated in the square nodes for each reaction r. The flow shown here defines Oró's [58] route from HCN to adenine (marked by red triangles) and corresponds to the net reaction 5HCN −→ H5C5N5 . Figure adapted from [59] For every reaction r, one can define a reverse reaction r that is obtained from r by exchanging the role of products and educts. That is, r is the reverse of r iff, for all x ∈ X , it holds that While thermodynamics dictates that every reaction is reversible in principle (albeit possibly with an extremely low reaction rate), it is a matter of modeling whether sufficiently slow reactions are included in the reaction set R.
Chemical reactions can be composed and aggregated into "overall reactions". In the literature on metabolic networks, pathways are of this form. An overall reaction consists of multiple reactions that collectively convert a set of educts into a set of products. It can be represented as a formal sum of reactions r∈R v r r , where the vector of multiplicities v ∈ N R 0 has non-negative integer entries. Thereby, [Sv] x determines the net consumption or production of compound x in the overall reaction specified by v.
A vector v ∈ N R 0 can be interpreted as an integer hyperflow in the following sense: If x is neither an educt nor a product of the overall reaction specified by v , then , every unit of x that is produced by some reaction r with v r > 0 is consumed by another reaction r ′ with v r ′ > 0.
The effect of an overall reaction can be represented via formal sums of species in two ways: as composite reactions, or as net reactions, Here we use the notation [c] + = c if c > 0 and [c] + = 0 for c ≤ 0 . In Eq. (5), intermediates, i.e., formal catalysts are cancelled. Hence, the net consumption (or production) of a species Fig. 1 shows the RN of Oro's prebiotic adenine synthesis from HCN and the integer hyperflow v corresponding to the net reaction "5 HCN −→ adenine" as an example.
While a restriction to integer hyperflows v ∈ N R 0 is necessary in many applications, see e.g. [21] for a detailed discussion, it appears mathematically more convenient to use the more general setting of fluxes v ∈ R R ≥ as in the analysis of metabolic pathways. To emphasize the connection with the body of literature on network (hyper) flows we will uniformly speak of flows.
For any vector v ∈ R R , we write v ≥ 0 if v is non-negative, v > 0 if v is non-negative and non-zero, that is, at least one entry is positive, and v ≫ 0 if all entries of v are positive. Analogously, we write A non-trivial flow satisfies v > 0 , i.e., v = 0 . Two flows v 1 and v 2 are called parallel if they describe the same net reaction. In particular, we therefore have Sv 1 = Sv 2 for parallel flows.
Futile cycles in a RN are non-trivial flows for which educts and products coincide and thus the net reaction is empty.
We use the term futile cycle in the strict sense to describe the concurrent activity of multiple reactions (or pathways) having no net effect other than the dissipation of energy. In the literature on metabolic networks often a less restrictive concept is used that allows certain compounds (usually co-factors, ATP/ADP, redox equivalents, or solvents) to differ between products and educts, see e.g. [61][62][63][64]. In this setting, the net reaction of concurrent glycolysis and gluconeogenesis, namely the hydrolysis of ATP, is viewed as energy dissipation rather than a chemical reaction. In our setting, ATP + H 2 O −→ ADP + P − i + H + , is a net reaction like any other, and hence a futile cycle would only arise if recycling of ATP, i.e., ADP + P − i + H + −→ ATP + H 2 O , was included as well.
If a RN has a futile cycle, it also has an integer futile cycle v ∈ N R 0 , since S has integer entries and thus its kernel has a rational basis, which can be scaled with the least common denominator to have integer entries.
Chemical reactions are subject to thermodynamic constraints that are a direct consequence of the conservation of energy, the conservation of mass, and the reversibility of chemical reactions. In the context of chemistry, conservation of mass is of course a consequence of the conservation of atoms throughout a chemical reaction. In the following sections, we investigate how these physical principles constrain RNs. Since we have introduced RNs in terms of abstract molecules and reactions, Eq. (1), we express the necessary conditions in terms of the stoichiometric matrix S , which fully captures only the proper part of the RN. Throughout this work, therefore, we assume that (X, R) is a closed RN, unless explicitly stated otherwise.

Thermodynamic constraints Reaction energies and perpetuum mobiles
Every chemical reaction r is associated with a change in the Gibbs free energy of educts and products. We therefore introduce a vector of reaction (Gibbs free) energies g ∈ R R and write (X, R, g) for a RN endowed with reaction energies. The reaction energy for an overall reaction is the total energy of the individual reactions involved. In terms of v ∈ R R , it can be expressed as where �·, ·� denotes the scalar product on R R .
Futile cycles may act as a chemical version of a perpetuum mobile. This is the case whenever a flow v > 0 with zero formal net reaction, Sv = 0 , increases or decreases energy, i.e., if �g, v� � = 0.
The classical concept of a perpetuum mobile decreases its energy, g, v < 0 , thereby "creating" energy for its environment. An "anti" perpetuum mobile with g, v > 0 would "annihilate" energy. Either situation violates energy conservation and thus cannot be allowed in a chemical RN. Obviously, there is no perpetuum mobile if (X, R) does not admit a futile cycle.
In fact, thermodynamics dictates that Gibbs free energy is a state function. Two parallel flows v 1 and v 2 therefore must have the same associated net reaction energies. That is, Let (X, R, g) be thermodynamic, (X ′ , R ′ ) be a subnetwork of (X, R) , and g ′ be the restriction of g to R ′ . Then v ′ ∈ R R ′ corresponds to v ∈ R R with supp (v) ⊆ R ′ , and thus v ′ ∈ R R ′ and S ′ v ′ = 0 imply Sv = 0 and further �g ′ , v ′ � = �g, v� = 0 . Hence (X ′ , R ′ , g ′ ) is again thermodynamic.
We note that the reaction energies of a reaction r and its reverse r necessarily cancel:

Lemma 5
If r and r are reverse reactions in a thermodynamic network (X, R, g) , then g r = −g r .
Proof If r and r are reverse reactions, then v with v r = v r = 1 (and v r ′ = 0 otherwise) satisfies Sv = 0 . Thus �g, v� = g r + g r = 0 .

Digression: molecular energies and Hess' Law
Every molecular species x ∈ X has an associated Gibbs free energy of formation. For notational simplicity, we write G x instead of the commonly used symbol G f (x) . The corresponding vector of molecular energies is denoted by G ∈ R X . Molecular energies and reactions energies g ∈ R R are related by Hess' law: For every reaction r ∈ R , it holds that In matrix form, the relationship between reaction energies g and molecular energies G amounts to Proposition 6 Let (X, R) be a RN and g ∈ R R be a vector of reaction energies. Then (X, R, g) is thermodynamic if and only if there is a vector of molecular energies G ∈ R X satisfying Hess' law, Eq. (7).
Proof By Definition 4, (X, R, g) is thermodynamic if g ∈ (ker S) ⊥ = im S ⊤ , that is, if there is G such that g = S ⊤ G , satisfying Hess's law.
Note that the vector of molecular energies G is not uniquely determined by g in general.

Reversible and irreversible networks
To begin with, we consider purely reversible or irreversible RNs.
In reversible networks, general vectors v ∈ R R have corresponding flows ṽ ≥ 0 with the same net reactions and, in the case of thermodynamic networks, with the same energies.

Lemma 8
Let (X, R, g) be a reversible RN (with reaction energies), and let v ∈ R R be a vector. Then there is a flow Since (X, R) is reversible, each reaction r ∈ R has a reverse r , and we define the reverse flow v 2 > 0 such that v 2 r = v 2 r . By construction, it satisfies Sv 2 = −Sv 2 . Now consider the flow ṽ = v 1 +v 2 > 0 . It satisfies If the network is thermodynamic, then the reverse flow satisfies �g,v 2 � = −�g, v 2 � , by Lemma 5. Hence, By definition, a thermodynamic network cannot contain a perpetuum mobile. Conversely, by the result below, if a reversible network is not thermodynamic, then it contains a perpetuum mobile.

Proposition 9
Let (X, R, g) be a reversible RN with reaction energies. Then, the following two statements are equivalent: Proof Suppose (X, R, g) is not thermodynamic. That is, there is v ∈ R R with Sv = 0 and �v, g� � = 0 . By Lemma 8, there is ṽ ≥ 0 with Sṽ = 0 and �ṽ, g� � = 0 , that is, a perpetuum mobile.
The exclusion of a perpetuum mobile is not sufficient in non-reversible systems.
Example 10 Consider the following RN (with reaction energies g): It contains one futile cycle, (8) but no perpetuum mobile. However, it contains two parallel flows with different energies, Hence, the RN (with reaction energies g ) is not thermodynamic. By setting g 3 = −2 , it can be made thermodynamic.
Many RN models are non-reversible, i.e., they contain irreversible reactions whose reverse reactions are so slow that they are neglected. From a thermodynamic perspective, irreversible reactions r must be exergonic, i.e., g r < 0 . We first consider the extreme case that all reactions r ∈ R are irreversible.

Proposition 11
Let (X, R, g) be an irreversible RN with reaction energies. Then, every futile cycle is a perpetuum mobile. Hence, if (X, R, g) is thermodynamic, then there are no futile cycles.
Proof Consider a futile cycle, that is, a flow v > 0 with Sv = 0 . Since all reactions are exergonic, v r > 0 implies g r < 0 and further g, v < 0 , that is, v is a perpetuum mobile. Now, if there is a futile cycle and hence a perpetuum mobile, then the network is not thermodynamic.

Thermodynamic soundness
We next ask whether a RN (X, R) can always be endowed with a vector of reaction energies g such that (X, R, g) is thermodynamic.
We note that thermodynamic soundness is a hereditary property of RNs, since we have seen above that if (X, R, g) is a thermodynamic network so are all its subnetworks (X ′ , R ′ , g ′ ).
Again, we first consider purely reversible or irreversible RNs.

Theorem 14 An irreversible RN is thermodynamically sound if and only if there are no futile cycles.
Proof By Gordan's Theorem (which is in turn a special case of Minty's Lemma [65], see Appendix B in [66]): Either there is a negative g ∈ (ker S) ⊥ or there is a nonzero, non-positive v ∈ ker S . That is, either there is It is not always obvious from the specification of an artificial chemistry model whether or not it is thermodynamically sound. As an example, we consider the artificial chemistry proposed in [67]. It considers only binary reactions (two educts) that produce two products, aiming to ensure conservation of particle numbers. In one variant, the networks only contains irreversible and thus exergonic reactions. It may produce, for instance, the following set of reactions: Their sum corresponds to the flow v = (1, 1, 1, 1) ⊤ ≥ 0 and yields the exergonic composite reaction that is, Sv = 0 . Thus the model admits a futile cycle composed entirely of exergonic reactions and hence a perpetuum mobile. Thus it is not thermodynamically sound.

Mixed networks
In many applications, RNs contain both reversible and irreversible reactions, . There are two interpretations of such models: (a) In the (lax) sense used above, reversible reactions can be associated with arbitrary energies, while irreversible reactions are considered exergonic. That is, the reaction energies must satisfy g r < 0 for r ∈ R irr . (b) In a strict sense, the reaction energies assigned to irreversible reactions are much more negative than the reaction energies of the reversible ones. After scaling, one requires |g r | ≤ 1 (that is, −1 ≤ g r ≤ 1 ) for r ∈ R rev and |g r | ≥ γ (that is, g r ≤ −γ ) for r ∈ R irr and (large) γ > 1 . The intuition is that reactions r with g r ≥ γ can be neglected.
The following example shows that thermodynamic soundness differs in the lax and strict senses.
Example 15 Consider the following RN (with reaction energies g): for some g > 0 . It contains two futile cycles: By setting g = 1/2 , the RN can be made thermodynamic.
(Then the second futile cycle is not a perpetuum mobile.) However, the RN in (10) cannot be seen as the limit of a thermodynamic, reversible network (A ↔ B ↔ C ↔ A) for large g. Thereby, one considers small g 1 , g 1 and large negative g 2 , g 3 (and hence large positive g 2 , g 3 , that is, negligible reverse reactions 2, 3 ). Any such (limit of a) reversible RN contains a perpetuum mobile (the second futile cycle); equivalently, it is not thermodynamic.

Definition 16
A mixed network is thermodynamically sound if there are reaction energies g such that (X, R, g) is thermodynamic and g r < 0 for r ∈ R irr . is strictly thermodynamically sound if, for all γ > 1 , there are reaction energies g such that (X, R, g) is thermodynamic, |g r | ≤ 1 for r ∈ R rev , and The scaling condition can also be written in the form A more detailed justification for strict thermodynamic soundness in mixed networks will be given in the next subsection when considering open RNs. Here, we focus on the relationship of thermodynamic soundness and futile cycles.

Theorem 17 A mixed RN is thermodynamically sound if and only if there is no irreversible futile cycle.
Proof By a "sign vector version" of Minty's Lemma: Either there is g ∈ (ker S) ⊥ with g r < 0 for r ∈ R irr (the network is thermodynamically sound) or there is a non-

Theorem 18 A mixed RN
is strictly thermodynamically sound if and only if no futile cycle contains an irreversible reaction.
Thereby, a sum of intervals is defined in the obvious way, yielding an interval which is positive if each of its elements is positive. Via v → −v , the interval condition (12) is equivalent to: there is v ∈ ker S with As necessary conditions, we find (i) v r * > 0 for some r * ∈ R irr and (ii) v r ≥ 0 for all r ∈ R irr . By Lemma 8,(iii) there is an equivalent flow with v r ≥ 0 for r ∈ R rev . That is, there is a futile cycle v > 0 involving an irreversible reaction. For γ large enough, the necessary conditions are also sufficient for the interval condition (13).
We may characterize strict thermodynamic soundness for mixed networks also in geometric terms:

Corollary 19 Let
, L rev = im S rev , and C irr = cone S irr . Then, (X, R) is strictly thermodynamically sound if and only if it is thermodynamically sound and L rev ∩ C irr = {0}.

Open (mixed) networks
Opening the RN, i.e., adding transport reactions alters the representation of reaction energies. We now have to consider chemical potentials involving concentrations, i.e., we replace the (Gibbs free) ener- is the activity of x, which approximately coincides with the concentrations. A reaction r then proceeds in the forward direction whenenver the chemical potential of the products is smaller than the chemical potential of the educts, i.e., if This condition can be rewritten in terms of the reaction (Gibbs free) energies and (the logarithm of ) the "reaction quotient", see e.g. [68]: (Note that this futile cycle is not an actual cycle of the graph.) As a result, the network is thermodynamically sound, but not strictly thermodynamically sound. In a metabolically relevant example from glycolysis/gluconeogenesis, the compounds are S = fructose-6-phosphate, P = fructose-1,6-bisphosphate, E = phosphofructokinase 1, and F = fructose-1,6-bisphosphatase, and reactions 2 and 4 involve additional compounds (bottom). As a consequence, there is no non-trivial futile cycle (in the strict sense of this work). In fact, the vector v above then represents the net reaction ATP + H 2 O → ADP + P i . Still, it is called a futile cycle or substrate cycle in the literature on metabolic networks. (In our approach, reactions producing/consuming the additional compounds P i must be added to the network to obtain a futile cycle. Such a futile cycle involves the active reactions in v , and hence the extended network cannot be strictly thermodynamically sound.) The activities [x] for x ∈ X therefore define an upper bound on the reaction energy g r . In an open system, (internal) concentrations may be buffered as fixed values or are implictly determined by given influxes or external concentrations [69]. Given a specification of the environment, i.e., of the transport fluxes and/or buffered concentrations, the upper bound in Eq. (15) can have an arbitrary value. Thus, if an irreversible reaction in R is meant to proceed forward for all conditions, it must be possible to choose g r < 0 arbitrarily small, i.e., |g r | arbitrarily large. This amounts to requiring that (X, R p ) is strictly thermodynamically sound. In many studies of reaction networks, one requires that a reaction proceeds forward in a given situation, but allows that it proceeds backward in other situations. In this (lax) interpretation of irreversibility it is sufficient to require that (X, R p ) is thermodynamically sound, but not necessarily strictly thermodynamically sound. In Def. 16, we introduce (strict) thermodynamical soundness in terms of reaction energies, and in Thms. 17 and 18, we characterize it in terms of futile cycles. In a corresponding approach [70,71], "extended" detailed balance is required for (closed) RNs with irreversible reactions at thermodynamic equilibrium. Thereby, activities [x], rate constants k + , k − and equilibrium constants K are explicitly used to formulate Wegscheider conditions for non-reversible RNs that are limits of reversible systems. The characterization of such systems in [70] is equivalent to our results.

Reversible completion
As models of chemistry, non-reversible networks are abstractions that are obtained from reversible thermodynamics networks by omitting the reverse of reactions that mostly flow into one direction.

Definition 20
Let (X, R, g) be a thermodynamic RN with . The reversible completion of (X, R, g) is the RN (X, R * , g * ) with and g * r = g r for and g * r = −g r for r ∈ R irr .
Lemma 21 If (X, R, g) is a thermodynamic RN, then its reversible completion (X, R * , g * ) is also a thermodynamic RN.
Proof Let r ∈ R * be the reverse reaction of r ∈ R irr . By Prop. 6, for every r ∈ R there is a vector G ∈ R X satisfying Hess' law. It suffices to show that G still satisfies Hess' law for (X, R * ) . By the definition of r , its reaction energy is g * r = x∈X G x (s + xr − s − xr ) = x∈X G x (s − xr − s + xr ) = −g r , as required by Def. 20. Thus (X, R * , g * ) is also thermodynamic.
The following result is an immediate consequence of Lemma 21.

Proposition 22
If the RN (X, R) is thermodynamically sound, then its reversible completion is also thermodynamically sound, and the reaction energies g can be choosen such that g r < 0 < g r for all r ∈ R irr .

Mass conservation and cornucopias/abysses
Thermodynamic soundness is not sufficient to ensure chemical realism. As an example, consider the random kinetics model introduced in [72]. It assigns (a randomly chosen) energy G(x) to each x ∈ X . Each reaction r is defined by randomly picking a set of educts e − r and products e + r . A possible instance of this model comprises four compounds with molecular energies G(A) = −5 , G(B) = −5 , G(C) = −10 , and G(X) = −2 , and two reactions The first reaction is exergonic with g 1 = −2 and the second has reaction energy g 2 = 0 . The composite reaction, obtained as their sum, is A + B → A + B + X . Ignoring the effective catalysts A and B, the corresponding net reaction is ∅ → X . In this universe, therefore, it is possible to spontaneosly create mass in a sequence of exergonic reactions. Reverting the signs of the energies reverts the two reactions and thus yields an exergonic reaction that makes X disappear.
We can again describe this situation in terms of flows. Recall that [Sv] x is the net production or consumption of species x. The spontaneous creation or annihilation of mass thus corresponds to flows v > 0 with Sv > 0 or Sv < 0 , respectively. Systems with cornucopias or abysses cannot be considered as closed systems. The proper part of chemical reaction networks therefore must be free of cornucopias and abysses.
Since in a reversible network any vector v ∈ R R can be transformed into an equivalent flow ṽ ≥ 0 (with Sṽ = Sv ), cf. Lemma 8, we have the following characterization.

Proposition 24 A reversible RN is free of cornucopias and abysses if and only if there is no vector
In fact, mass conservation rules out cornucopias and abysses. More generally, a reaction invariant is a property that does not change over the course of a chemical reaction [8,27,29]. Here, we are only interested in linear reaction invariants, also called conservation laws [73], that is, quantitative properties of molecules (such as mass) whose sum is the same for educts and products.

Definition 25 A linear reaction invariant or conservation law is a non-zero vector m ∈ R X that satisfies
x∈X m x s + xr = x∈X m x s − xr for all reactions r ∈ R , that is, m ⊤ S = 0.

Definition 26
A RN is conservative if it has a positive conservation law, that is, if there is m ∈ R X such that m ≫ 0 and m ⊤ S = 0.
By definition, a conservative network is free of cornucopias and abysses. Conversely, by the result below, if a reversible network is not conservative, then it contains a cornucopia (and an abyss).

Theorem 27 A reversible RN (X, R) is free of cornucopias and abysses if and only if it is conservative.
Proof By Stiemke's Theorem (which is in turn a special case of Minty's Lemma): Either there is a nonzero, non-negative n ∈ im S or there is a positive m ∈ ( im S) ⊥ = ker S ⊤ . That is, either there is v ∈ R R with n = Sv > 0 (corresponding to a cornucopia ṽ > 0 ) or there is m ≫ 0 with S ⊤ m = 0 (as claimed).
We therefore conclude that every closed chemical RN must have a positive reaction invariant. This is no longer true if the RN is embedded in an open system and mass exchange with the environment is allowed. By construction, each transport reaction violates at least one of the conservation laws of the closed system, since [m ⊤ S] r > 0 if r is an import reaction and [m ⊤ S] r < 0 if it is an export reaction. As discussed e.g. in [73], opening a RN by adding import or export reactions, can only reduce the number of conservation laws and cannot introduce additional constraints. Nevertheless, a RN must be chemically meaningful when the import and export reactions are turned off. That is, its proper part (X, R p ) must be conservative to ensure that it has a chemical realization.

Conservation of atoms and moieties
Molecules are composed of atoms, which are -by definition -preserved in every chemical reaction. For each atom type a, there is a conservation law that accounts for the number of atoms of type a in each compound x. More precisely, denote by A ax ∈ N 0 the number of atoms of type a in molecule x, i.e., the coefficients in the chemical sum formula a A ax a for compound x. (Alternatively, we may think of sum formulas as multisets of atoms.) Conservation of atom a in reaction r therefore becomes For all atoms and reactions and in matrix form, this condition reads AS = 0 . Each row of the matrix A thus is a non-negative linear reaction invariant, i.e., a non-negative conservation law.
Conserved moieties are groups of atoms that remain intact in all reactions in which they appear [26,28,30]. Like atoms, they lead to non-negative integer conservation laws.
"Semi-positive" conservation laws [26,74] of a RN are the non-zero elements of the polyhedral cone the non-negative left-kernel of S . Thereby, K (S) is an s-cone as defined in [75], given by a subspace (here: ker S ⊤ ) and non-negativity conditions. Since the s-cone K (S) is contained in the non-negative orthant, its extreme (non-decomposable) vectors agree with its support-minimal vectors. Further, since S is an integer matrix, all extreme vectors of K (S) are positive real multiples of integer vectors.
All potential moiety conservation laws (MCLs) [76] for a given stoichiometric matrix S (but unknown atomic composition) are non-zero, integer elements of K (S) , i.e., elements of the set Clearly, K(S) contains the integer extreme vectors of K (S) . Ultimately, one is interested in minimal MCLs, i.e., minimal elements of K(S) , cf. [77]. (Minimal vectors are called maximal in [74].) In fact, integer minimality and integer non-decomposability are equivalent.
Most importantly, the minimal MCLs generate all MCLs.

Theorem 30 Every element of K(S) is a finite integer linear combination of minimal elements of K(S).
Proof By Noetherian induction on the partial order < on N X 0 and Proposition 29.
Knowing all minimal MCLs allows to represent the compounds X of a RN (X, R) in a minimal (most coarsegrained) way.
generate ker S ⊤ , that is, im M ⊤ = ker S ⊤ . Hence, the number of minimal MCLs is greater equal d, that is, |M| ≥ dim ker S ⊤ .
By instantiating the abstract moieties {X, Y, Z} with sum formulas (multisets of atoms), every chemical realization of the reaction can be obtained. In general, we define an instance as follows.
Definition 33 A sum formula instance (short: sfinstance) of a RN (X, R) with stoichiometric matrix S is a matrix A ∈ N A×X 0 for some non-empty, finite set A of "atoms" such that (i) each column of A is non-zero, and (ii) AS = 0.
Def. 33 in particular allows that A comprises a single row. By condition (i), this row vector is a strictly positive conservation law, which, as a linear combination of MCLs, may be chosen to be integer valued. Conversely, if (X, R) admits an sf-instance, then the column-sum m = 1 ⊤ A ∈ ker S ⊤ is a strictly positive integer conservation law and thus in particular an sf-instance with |A| = 1 . Taken together, we have shown the following existence result.

Proposition 34 A RN (X, R) admits an sf-instance if and only if it is conservative.
The entry m x of m can be interpreted as the total number of atoms in compound x ∈ X . In [78], a RN is called primitive atomic if each reaction preserves the total number of atoms. Thus a RN is primitive atomic if and only if it is conservative, cf. [78].

Isomers and sum formula realizations
In order to gain a better understanding of sf-instances for a RN (X, R) , we consider net reactions of the form X → Y in the reversible completion of (X, R) . That is, we ask whether it is possible, in principle, to convert X into Y, irrespective of whether the conversion is thermodynamically favorable. From a chemical perspective, if such a net isomerization reaction exists, then X and Y must be compositional isomers. These will play a key role in our discussion of realizations of (X, R) in terms of sf-instances.
Before we proceed, we first give a more formal account of net isomerization reactions. Recall that a net reaction derives from an overall reaction, which in turn is specified by an integer hyperflow. Instead of working explicitly in the reversible completion, we may instead consider vectors v ∈ Z R with negative entries v r < 0 , representing the reverse of irreversible reactions r ∈ R.

Definition 35
Let (X, R) be a RN with stoichiometric matrix S . A vector v ∈ Z R , satisfying k:= − [Sv] x = [Sv] y ∈ N for some x, y ∈ X and [Sv] z = 0 for all z ∈ X \ {x, y} , specifies a net isomerization reaction k x → k y . Two (distinct) compounds x, y ∈ X are obligatory isomers if (X, R) admits a net isomerization reaction k x → k y . We write x ⇋ y if x = y or x and y are obligatory isomers.

Proposition 36
The binary relation x ⇋ y introduced in Def. 35 is an equivalence relation.
Proof By definition, ⇋ is reflexive. If v specifies the net isomerization reaction k x → k y , then −v specifies k y → k x , and thus ⇋ is symmetric. To verify transitivity, suppose x ⇋ y and y ⇋ z , i.e., there are vectors v 1 and v 2 that specify the net isomerization reactions p x → p y and q y → q z . Then v = qv 1 + pv 2 satisfies [Sv] x = −pq , [Sv] z = pq , [Sv] y = 0 , and [Sv] u = 0 for all u ∈ X \ {x, y, z} , and thus specifies the net isomerization reaction (pq) x → (pq) z . Thus, ⇋ is transitive.
The intuition is to define a sum formula realization of a RN as a matrix A that (i) is an sf-instance of the RN and (ii) assigns different atomic compositions to x and y whenever x ⇋ y , that is, whenever x and y are not isomers. In the following, we will see that such a definition both ensures chemical realism and leads to a useful mathematical description. The next result relates net isomerization reactions to the structure of ker S ⊤ (and ultimately to compositional isomers as given by MCLs and sf-instances). Proof First suppose x ⇋ y . Then either x = y (in which case the assertion is trivially true) or there is a net isomerization reaction k x → k y specified by the vector v . Let m ∈ ker S ⊤ . By the definition of v , we have Now suppose m x = m y for all m ∈ ker S ⊤ and consider the vector w ∈ Z X with w x = −1 , w y = 1 , and w z = 0 for all z ∈ X \ {x, y} . Clearly, �m, w� = 0 for all m ∈ ker S ⊤ , that is, w ∈ (ker S ⊤ ) ⊥ = im S . Thus there is v ∈ R R such that w = Sv . Since S ∈ Z X×R , the solution v of this linear equation is rational. Writing lcd (v) for the least common denomintor of the entries in v , we obtain the integer vector lcd (v) v ∈ Z R , specifying the net isometrization reac- The proof of Thm. 37 also provides a simple algorithm to compute integer hyperflows v that specify net isomerization reactions and to identify the obligatory isomers: For each pair x, y ∈ X , construct w with w x = −1 and w y = 1 being the only non-zero entries and solve the linear equation Sv = w . We have x ⇋ y if and only if a solution exists, in which case the desired integer hyperflow is lcd (v) v.
We next show that obligatory isomers cannot be distinguished by sf-instances, and conversely, compounds that are not obligatory isomers are distinguished by certain sf-instances.

Theorem 38
Let (X, R) be a RN with stoichiometric matrix S and A ∈ N A×X 0 be an sf-instance. If im A ⊤ = ker S ⊤ , then the following statements are equivalent: (i) x, y ∈ X are obligatory isomers; (ii) A ax = A ay for all a ∈ A.
Proof Let x, y ∈ X be distinct. On the one hand, by Theorem 37, statement (i) is equivalent to m x = m y for all m ∈ ker S ⊤ . On the other hand, statement (ii) is equivalent to m x = m y for all m ∈ im A ⊤ . If im A ⊤ = ker S ⊤ , then (i) and (ii) are equivalent. If im A ⊤ ⊆ ker S ⊤ , that is, if the rows of A are elements of ker S ⊤ , then (i) implies (ii).
Any sf-instance A whose rows span ker S ⊤ not only identifies obligatory isomers, but also assigns distinct sum formulas to any distinct compounds x, y ∈ X that are not obligatory isomers. In this case, there is at least one row (corresponding to atom a) for which A ax = A ay . This provides the formal justification for a mathematical definition of sum formula realizations.

Definition 39
A sum formula realization (short: sf-realization) of a RN (X, R) with stoichiometric matrix S is a matrix A ∈ N A×X 0 for some non-empty, finite set A of "atoms" such that (i) each column of A is non-zero and (ii) im A ⊤ = ker S ⊤ .
As an illustration, consider the RN depicted on the left side of Fig. 3. The RN can be instantiated by the sum formulas U = A , V = B , W = C , X = AB , Y = AC , Z = ABC . The corresponding matrix A (middle right in Fig. 3) is not only an sf-instance, its rows also span ker S ⊤ , and hence it is an sf-realization. (In fact, it is also the mm-representation.) A "reduced representation" can be obtained by assuming that U, V, and W are compositional isomers corresponding to the same moiety D, that is, U = V = W = D . As a consequence, X and Y are also compositional isomers, X = Y = D 2 , and further Z = D 3 . The corresponding matrix A ′ still defines an sf-instance, but its rows do not span ker S ⊤ . Now consider an extension of the RN in Eq. (26), by adding three isometrization reactions, In the extended network given by Eq. (26) and Eq. (27), we have dim ker S ⊤ = 1 , and thus there is a unique MCL. The reactions in Eq. (27) now enforce that U, V, and W are compositional isomers and thus correspond to the same moiety D. This coincides with the "reduced representation" A ′ for the RN in Eq. (26). The distinction is that, for the RN of Eq. (26), we may (but do not have to) assume that U, V, and W are isomers, whereas in the extended network, no other interpretation is possible.
Finally, we characterize RNs that admit an sf-realization.

Proposition 40 A RN (X, R ) admits an sf-realization if and only if it is conservative.
Proof Suppose (X, R) admits an sf-realization, which, in particular, is an sf-instance. By Prop. 34, (X, R) is conservative. Conversely, suppose (X, R) is conservative. By definition, the mm-representation is an sf-instance, and by Lemma 32, it is an sf-realization.
Obligatory isomers put some restriction on sfinstances. Still, there is surprising freedom for sf-realizations. We say that two sf-realizations A and A ′ are equivalent, A ∼ A ′ , if there are integers p, q ∈ N such that pA = qA ′ . One easily checks that ∼ is an equivalence relation. If dim ker S ⊤ = 1 , then all m ∈ dim ker S ⊤ are multiples of the unique minimal MCL. All sum formulas are then of the form D k , and thus we can think of compounds simply as integers k ∈ N . Every reaction thus can be written in the form k s − kr D k → k s + kr D k with k (s + kr − s − kr )k = 0 . An example of practical interest is the rearrangement chemistry of carbohydrates, found in metabolic networks such as the pentose phosphate pathway (PPP) or the non-oxidative part of the Calvin-Benson-Bassham (CBB) cycle in the dark phase of photosynthesis. Carbohydrates may be seen as a "polymers" of formaldehyd units and can therefore be written as D k = (CH 2 O) k . The PPP interconverts pentoses (e.g. ribose) and hexoses (such as glucose), in an atomeconomic (no waste) rearrangement network possessing the overall reaction 6(CH 2 O) 5 ⇐⇒ 5(CH 2 O) 6 . In a similar fashion five 3-phosphoglycerates are reconfigured via carbohydrate chemistry into three ribulose-5-phosphate which results in the overall reaction of 5(CH 2 O) 3 → 3(CH 2 O) 5 if focusing on the sugar component. Carbohydrate reaction chemistry is particularly well-suited for the implementation of isomerization networks, and the logic and structure of the design space of alternative networks implementing the same overall reaction has been explored using mathematical and computational models [21,80]. Fig. 4 shows the RN of the prebiotic carbohydrate formation according to [79]. The analysis of the corresponding stoichiometric matrix, available as Additional file 1, shows that all Cn compounds are obligatory isomers. Furthermore, their sum formulas are necessarily multiples of the C1 unit, which corresponds to formaldehyd in the formose reaction.
For dim ker S ⊤ > 1 , there is an infinite set of sf-realizations that are pairwisely inequivalent. To see this, construct matrices A t = (t 1 y 1 , t 2 y 2 , . . . , t k y k ) ⊤ from k = dim ker S ⊤ > 1 linearly independent (minimal) MCLs y i and with t ∈ N k . Clearly, every such matrix A t is an sfrealization. Furthermore, Clearly, there is an infinite set T ⊆ N k of integer vectors such that this inequality is satisfied for all distinct t, t ′ ∈ T . For instance, one may choose distinct primes for all entries of t ∈ T . Thus there are infinitely many pairwisely inequivalent sf-realizations. Furthermore, the choice of the (minimal) MCLs is not unique, in general, allowing additional freedom for sf-realizations. Finally, one may produce more  1, 0, 1, 0, 0, 1, 0) is represented by the composite reaction X + (U + W) + V → (U + V) + Y + W and specifies the net isometrization reaction X → Y complex sf-realizations by appending additional rows to A that are linear combinations of the basis vectors. Therefore we have the following result.

Proposition 41
Let (X, R) be a conservative RN with stoichiometric matrix S. If dim ker S ⊤ > 1, then there are infinitely many in-equivalent sf-realizations of (X, R).

Structural formula realizations
A structural formula represents a chemical species as a (connected) molecular graph, whose vertices are labeled by atom types and edges refer to chemical bonds. Lewis structures [81] are equivalent to vertex-labeled multigraphs in which each bonding electron pair is represented as an individual edge, and each non-bonding electron pair as a loop. In particular, double or triple Fig. 4 Reaction network of the formose reaction describing pre-biotic carbohydrate formation [79]. The RN is drawn here in a simplified form showing aldol and retro-aldol reactions (those with 1 educt and 2 products, and vice versa) without their reverse reactions. The stoichiometric matrix of the full network comprising all 38 reaction connecting the 29 compounds is provided as Addition file 1. Compounds are labeled by the number of carbon atoms. C1a (in the center) designates formaldehyd, C3c is dihydroxy acetone. The network was drawn an analyzed with MØD [21]. All compounds with the same number of carbons are obligatory isomers. Moreover, all sum formula representations are of the from A n , with A denoting the moiety corresponding to the formaldehyd unit bonds are shown as two or three parallel edges. The educt and product complexes r − and r + of a reaction r can then be represented as the disjoint unions of the educt and product graphs, respectively. A chemical reaction is a graph transformation that converts the educt graph into the product graph such that vertices and their labels are preserved [33,82]. Only the bonds are rearranged. Since electrons are conserved, and each edge or loop accounts for two electrons, any reaction must preserve the sum of vertex degrees and thus the number of edges. Fig. 5 shows an example.
This idea can be generalized to sf-realizations in which "atoms" are viewed as moieties. We may then interpret the vertices of a multigraph as "fragments" of species that are endowed with a certain number of "valencies" or "half bonds". These must be "saturated" by binding to free valencies of other moieties or they must be used to form internal bonds within a moiety. In graph theory, the degree of a vertex is simply the number of incident edges. In chemistry, a related notion is the valency of an atom, i.e., the number of bonds (counting bond order) that can be formed by an atom. Each type of atom/moiety therefore has a fixed degree that we can think of as the number of halfbonds. Each of these may bind to other moieties or form a "loop", i.e., match up with another halfbond of the same vertex. Correspondingly, the degree d(u) of a vertex u in a multigraph is defined as the number of edges that connect u with other vertices plus twice the number of loops. A reaction thus preserves electrons if and only if its only effect is to rearrange the bonds in the multigraph. The valency val(a) of an atom of type a is most naturally interpreted as the number of electrons in the outer shell. Loops then correspond to non-bonding electron pairs. This notion of valency matches Frankland's "atomicity" and conforms to the IUPAC terminology [83]. Much of the chemical literature, however, uses the term valency loosely for the number of bonds; it is then not an unambigous property of an element or atom and changes with the oxidation state.

Definition 42
Let A be a non-empty, finite set, val : A → N be an arbitrary function, and a∈A n a a be a sum formula. A multigraph Ŵ = (V , E, α) with loops and vertex coloring α : V → A is a corresponding structural formula if it satisfies the following conditions: (i) Each vertex u ∈ V corresponds to a moiety α(u) , in particular, |{u ∈ V : α(u) = a}| = n a . (ii) d(u) = val(α(u)) for all u ∈ V , i.e., the vertex degree of u is given by the corresponding moiety. (iii) Ŵ is connected.
The structural formulas specified in Def. 42 do not cover all Lewis structures. In particular, neither explicit charges nor unpaired electrons are covered. While these are important from a chemical perspective, we shall see below that such extensions are not needed for our purposes since the straightforward multigraphs in Def. 42 already provide sufficient freedom to obtain representations for all conservative RNs. Extensions to radicals and charges will be briefly considered in the Discussion section.

Definition 43
Let (X, R) be a RN, A be a non-empty, finite set, and val : A → N be an arbitrary function. A Lewis instance is an assignment of vertex-colored multigraphs Ŵ x = (V x , E x , α x ) to all x ∈ X such that (i) vertex degrees satisfy d(u) = val(α x (u)) , for all u ∈ V x and x ∈ X , and (ii) the corresponding matrix A ∈ N A×X 0 defined by A ax = |{u ∈ V x : α x (u) = a}| is an sf-instance.
Furthermore, x → Ŵ x is a Lewis realization if A is an sf-realization.
Clearly, every Lewis realization has a corresponding sfrealization. Given an sf-realization, we therefore ask when there is a corresponding Lewis realization. By Def. 42 and 43, we have the following result. Proof For the 'if' part, let x∈A A ax a be the sum formula for x ∈ X . By assumption, there exists a vertex-colored multigraph Ŵ x = (V x , E x , α x ) for x such that (i) vertex degrees The appeal of this characterization is that it does not use any properties of the RN (X, R) , at all. In fact, it is easy to see that such a representation always exists.

Lemma 45 Let
A be a nonempty, finite set and a∈A n a a be a sum formula. Then, there exists a corresponding structural formula with val(a) = 2 for all a ∈ A.
Proof If the sum formula is given by n a = 1 and n a ′ = 0 for all a ′ ∈ A \ {a} , i.e., if it is single moiety, then the corresponding structural formula is a single vertex with color a and a loop. Otherwise, arrange the |V | = a n a vertices, of which exactly n a are colored by a, in a cycle and connect the vertices along the cycle. Then every vertex u satisfies d(u) = val(α(u)) = 2 and the graph is connected.
The result extends to any constant function val(a) = 2k (with k ∈ N ) by adding k − 1 loops to each vertex. As an immediate consequence of Lem. 44 and 45, we have the following result.

Proposition 46 (X, R) has a Lewis realization if and only if it has an sf-realization.
Using Prop. 40, we can characterize RNs that admit a Lewis realiztion.

Proposition 47 A RN (X, R) admits a Lewis realization if and only if it is conservative.
Interestingly, the simple multigraphs in Def. 42 are sufficient to represent all conservative RNs and thus (the proper part of ) all chemical networks. Radicals and other chemical species whose structures cannot be expressed in terms of electron pairs therefore do not add to the universe of chemically realistic RNs. For more details, see the Discussion section.
Like an sf-realization, a Lewis realization does not necessarily assign distinct multigraphs Ŵ x and Ŵ y to distinct compounds x and y. In the case of sf-realizations, obligatory isomers must have the same sum formula. In Lewis realizations, however, they need not have the same multigraph.

Proposition 48
For every conservative RN (X, R) there exists an injective Lewis realization x → Ŵ x .
Proof Sf-representations can be constructed to have an arbitrary number of atoms or moieties for each x ∈ X , that is, the vertex sets V x of the corresponding multigraphs Ŵ x can be chosen arbitrarily large. Set val(a) = 4 for all a ∈ A and construct an initial Lewis representation of compounds as cycles, as in the proof of Lemma 44, but with an additional loop at each vertex. Consider two obligatory isomers x ⇋ y , and let the (adjacent) vertices u, v ∈ V x be connected (by a single edge). Now replace the two loops at the corresponding vertices u, v ∈ V y by two additional edges between u and v. If the equivalence class of obligatory isomers contains more than two compounds, choose sets of pairs of disjoint positions along the cycles and replace pairs of loops by double edges. This yields circular matchings, familiar e.g. from the theory of RNA secondary structures [85,86]. Setting n = |V x | − 5 , one can construct crossing-free circular Fig. 6 Construction of non-isomorphic multigraphs with valency 4 in the proof of Prop. 48. The first three isomers are a cycle (with loops), a cycle with a single triple-bond indicating an "origin", and a graph with an additional double bond. In the third graph, the asymmetric arrangement of the double and triple bonds implies an unambiguous ordering of the remaining vertices (numbered from 1 to n). Non-isomorphic graphs are obtained converting a pair of loops into a double bound. Since each vertex has at most one bond in addition to the cycle, the resulting graphs correspond to Kleitman's "irreducible diagrams" [84]. If crossings of bonds are excluded, the resulting induced subgraphs with vertex set {1, . . . , n} are isomorphic to RNA secondary structures on sequences of n monomers. The number S n of secondary structures grows asymptotically ∼ 2.6 n [85] matchings on n vertices, whose number grows faster than 2.6 n , see also Fig. 6. Thus, if V x is chosen large enough, an arbitrarily large set of obligatory isomers can be represented by non-isomorphic multigraphs. Note, finally, that the construction of non-isomorphic graphs does not depend on (the cardinality of ) the atom set A , and thus the construction is also applicable in the case |A| = 1 , i.e., dim ker S ⊤ = 1 .
The proof in particular shows that the number of vertices required to accommodate the obligatory isomers grows only logarithmically in the size of the equivalence classes of obligatory isomers.

Characterization of chemistry-like reaction networks
In this contribution, we have characterized reaction networks that are chemistry-like in the sense that they are consistent with the conservation of energy and mass and allow an interpretation as transformations of chemical molecules. It is worth noting that we arrive at our results without invoking mass-action kinetics, which has been the focus of interest in chemical reaction network theory since the 1970s [7][8][9]. Instead, we found that basic arguments from thermodynamics (without kinetic considerations) are sufficient. The main results of this contribution can be summarized as follows: (v) For every sf-realization of a RN (X, R) there is also a Lewis-realization, i.e., an assignment of multigraphs to each compound such that reactions are exclusively rearrangements of edges.
Such chemistry-like realizations, however, are by no means unique. In general, the same RN has infinitely many chemical realizations corresponding to different atomic compositions. The structure of the stoichiometric matrix S of a closed RN therefore implies surprisingly little about the underlying chemistry. Nevertheless there is interesting information that is independent of the concrete realization. For example, Thm. 37 can be reformulated as follows: The reversible completion of (X, R) admits a net reaction of the form p x −→ q y with x, y ∈ X and p, q ∈ N if and only if q m x = p m y for every m ∈ ker S ⊤ . This identifies "obligatory oligomers", necessarily composed of multiples of the same monomer.

Computational considerations
Somewhat surprisingly, the computational problems associated with recognizing "chemistry-like" RNs are not particularly difficult and can be solved by well-established methods. To see this, recall that (X, R) is conservative iff there is a vector m ≫ 0 such that S ⊤ m = 0 and not thermodynamically sound iff there is a vector v > 0 such that Sv = 0 and v r > 0 for some r ∈ R irr These linear programming problems can be solved in O((|X| + |R |) 2.37 ) time [87].
An integer (not necessarily non-negative) basis of ker S ⊤ can be computed exactly in polynomial time, e.g. using the Smith normal form, see [88]. Chubanov's algorithm finds exact rational solutions to systems of linear equations with a strict positivity constraint. Thus is can be employed to compute a strictly positive integer solution m ≫ 0 to S ⊤ m = 0 in polynomial time [89,90]. As a consequence, an sf-realization can also be computed explicitly in polynomial time. Each sum formula in turn can be converted into a graph with total effort bounded by max x∈X a A xa · |X| , the maximal number of atoms that appear in a sum formula times the number of molecules.
The equivalence relation ⇋ for obligatory isomers is determined by the existence of solutions to a linear equation of the form Sv = w and thus can also be computed in polynomial time, again bounded by the effort for matrix multiplication for each pair x, y ∈ X . A much more efficient approach, however, is to compute a basis of ker S ⊤ , from which ⇋ can be read off directly. This approach easily extends to "obligatory oligomers. " Treating RNs as closed systems is too restrictive to describe metabolic networks. There, RNs are considered as open systems that allow the inflow of nutrients and the outflow of waste products. Models of metabolism often impose a condition of viability. Traditionally, this is modeled as a single export "reaction" r bm of the form i α i C i → ∅ , known as the biomass function [91]. It comprises all relevant precursor metabolites C i (forming all relevant macromolecules) in their empirically determined proportions α i . Viability is then defined as the existence of a flow v > 0 with Sv = 0 and v bm > 0 . This linear programming problem can be tested efficiently by means of flux balance analysis (FBA) [92]. In contrast to (X, R) being conservative and thermodynamically sound, however, viability is a property of the metabolic model, not of the underlying representation of the chemistry.

Construction of random chemistry-like networks
The formal characterization of chemistry-like RNs developed here suggests several interesting questions for further research. In particular, our results define rather clearly how random chemistry-like RNs should be defined and thus poses the question whether there are efficient algorithms for their construction. Let us consider the task of generating a random chemistry-like RN in a bit more detail. We first note that it suffices to generate a stoichiometric matrix S ∈ N X×R 0 that is thermodynamically sound and conservative. If explicit catalysts are desired, they can be added to a reaction without further restrictions. More precisely, given S , we obtain a network with the same stoichiometric matrix plus catalysts by setting The "catalyst matrix" C may contain arbritrary integers c xr ≥ 0 . For the generation of a RN (X, R) , therefore, it can be drawn independently of S.
The key task of generating (X, R) is therefore the construction of an |X| × |R | integer matrix S that is conservative and thermodynamically sound. Both conditions amount to the (non)existence of vectors with certain sign patterns in ker S and ker S ⊤ , respectively. In order to obtain a background model for a given chemical RN, one might also ask for a random integer matrix that has a given left nullspace and is thermodynamically sound. In addition, one would probably like to (approximately) preserve the fraction of zero entries per row and column and the mean of the non-zero entries. To our knowledge, no efficient exact algorithms for this problem are known.
A potentially promising alternative is the independent generation of the complex matrix Y and the incidence matrix Z of the complex-reaction graph. Given a fixed conservative (28) s − xr = c xr , s + xr = c xr + s xr if s xr ≥ 0, s − xr = c xr − s xr , s + xr = c xr if s xr ≤ 0.
and thermodynamically sound RN, furthermore, one can make use of the heredity of thermodynamic soundness and conservativity and consider random subnetworks. This approach has been explored in particular for metabolic networks: The ensemble of viable metabolic networks in a given chemical RN can then be sampled by a random walk on the set of reactions [57] or a more sophisticated Markov-Chain-Monte-Carlo procedure [55,93].

Chemistry-like realizations
The structural formulas constructed in Lemma 45 are not very "realistic' from a chemical perspective. It is of interest, therefore, if one can construct chemically more appealing (multi-)graphs. As noted in the Introduction, the problem of designing a "molecular implementation" of a prescribed stoichiometric matrix S is a key problem in utilizing chemical reaction networks as computing devices. From a mathematical point of view there seem to be only a few constraints: (i) If a moiety a appears in isolation, i.e., as a molecule x = 1a , then val(a) must be even, since it contains val(a)/2 loops. (ii) The case val(a) = 1 is only possible if there is no compound composed exclusively of three or more copies of a or composed of more than two moieties with valency 1. (iii) It is well known that the sum of degrees must be even for every multigraph, and connectedness implies u val(u) ≥ 2(|V | − 1) [94]. The problem of finding multigraph realizations is closely related to, but not the same as, the problem determining the realizability of degree sequences in graphs [95] or multigraphs [96]. As in graph theory, it seems to be of particular interest to study realizability by structural formula in the presence of additional constraints on admissible graphs. Complementary to constraints on the Fig. 7 A Lewis structure-like presentation of NO 2 + NO −→ N 2 O 3 highlights that multigraphs with atom-type dependent degrees are not sufficient to represent all molecules of interest. To represent NO 2 , both an unpaired electron (shown as a semi-edge ending in a small black ball), an N atom with vertex degree 4 < val(N) = 5 and an oxygen atom with vertex degree 7 > val(O) = 6 are required. Similarly, NO is a neutral stable radical, with an unpaired electron at N. The product N2O3 has no unpaired electrons, but exhibits an O and an N atom with a deviant vertex degree and thus a net charge. Differences between nominal valency and actualy vertex degree are indicated by the charge symbols ⊕ and ⊖ . In general, the net charge at a vertex v is given by val (α(v)) − deg (v) multigraphs that render them plausible chemical graphs, the "chemical implementation" of a given S also involves constraints on the admissible (types of ) reactions, i.e., the allowed rearrangements of edges in the multigraphs. It is much less clear how to formalize this aspect, although there seems to be a connection to graph grammar models of chemical reactions [97].
An advantage of considering the multigraphs specified in Def. 42 instead of the full range of Lewis structures is that a well-established mathematical theory is available. However, "multigraphs with semi-edges", which are essentially equivalent to Lewis structures of radicals, have been studied occasionally in recent years [98,99] and may be an appealing framework, in particular, when restricted realizations are considered. The example of nitrogen oxids in Fig. 7 shows, however, that unpaired electrons (as in the Lewis structure of NO) are not the only issue. A complete implementation of Lewis structures also requires local net charges val(α(v)) − deg(v) at vertices v, as a semi-edge-like annotation distinct from unpaired electrons, see e.g. [100].

Infinite RNs
Throughout this contribution, we have assumed that (X, R) is finite. In general, however, chemical universes are infinite, at least in principle. The simplest example of infinite families are polymers. It is of interest, therefore, to develop a theory of infinite reaction networks. To this end, one could follow e.g. [101], where also infinite directed hypergraphs are considered, and further extend the literature on countably infinite undirected hypergraphs, see e.g. [102,103] and the references therein. Most previous work pre-supposed k-uniformity, i.e., hyper-edges of (small) finite cardinality, matching well with the situation in chemical RNs. Every sub-RN of an infinite RN induced by a finite vertex set Y ⊂ X can be assumed to support only a finite number of reactions (directed hyperedges) R Y ⊂ R . This amounts to assuming that a sub-RN induced by finite set of compounds Y is a finite RN. Every finite sub-RN of a "chemistry-like" infinite RN, furthermore, needs to be conservative and thermodynamically sound. Infinite RNs will not be locally finite, in general, since every compound x ∈ X may have infinitely many reaction partners, e.g., all members of a polymer family. Thus x may appear in an infinite number of reactions. These simple observations suggest infinite "chemistry-like" RNs are nontrivial structures whose study may turn out to be a worthwhile mathematical endeavor.

Appendix: Mathematical notation
We consider matrices and vectors indexed by chemical species x ∈ X or chemical reactions r ∈ R . Hence, both species and reactions can be thought of as endowed with an arbitrary, but fixed order determining the order of rows and columns. Standard mathematical notation is used without further explanation in the main text. Notation that may be less familiar is summarized here:
Additional file 1. Stoichiometric matrix of the complete formose RN, Fig. 4, in machine-readable form.